Recognition Using Classification and Segmentation Scoring

Authors

  • Owen Kimball
  • Mari Ostendorf
  • Jan Robin Rohlicek
Abstract

Traditional statistical speech recognition systems typically make strong assumptions about the independence of observation frames and generally do not make use of segmental information. In contrast, when the segmentation is known, existing classifiers can readily accommodate segmental information in the decision process. We describe an approach to connected word recognition that allows the use of segmental information through an explicit decomposition of the recognition criterion into classification and segmentation scoring. Preliminary experiments are presented, demonstrating that the proposed framework, using fixed length sequences of cepstral feature vectors for classification of individual phonemes, performs comparably to more traditional recognition approaches that use the entire observation sequence. We expect that performance gain can be obtained using this structure with additional, more general features.

1. INTRODUCTION

Although hidden-Markov-model (HMM) based speech recognition systems have achieved very high performance, it may be possible to improve on their performance by addressing the known deficits of the HMM. Perhaps the most obvious weaknesses of the model are the reliance on frame-based feature extraction and the assumption of conditional independence of these features given an underlying state sequence. The assumption of independence disagrees with what is known of the actual speech signal, and when this framework is accepted, it is difficult to incorporate potentially useful measurements made across an entire segment of speech. Much of the linguistic knowledge of acoustic-phonetic properties of speech is most naturally expressed in such segmental measurements, and the inability to use such measurements may represent a significant loss in potential performance.

In an attempt to address this issue, a number of models have been proposed that use segmental features as the basis of recognition. Although these models allow the use of segmental measurements, they have not yet achieved significant performance gains over HMMs because of difficulties associated with modeling a variable length observation with segmental features. Many of these models represent the segmental characteristics as a fixed-dimensional vector of features derived from the variable-length observation sequence. Although such features may work quite well for classification of individual units, such as phonemes or syllables, it is less obvious how to use fixed-length features to score a sequence of these units where the number and location of the units is not known. For example, simply taking the product of independent phoneme classification probabilities using fixed length measurements is inadequate. If this is done, the total number of observations used for an utterance is $F \times N$, where $F$ is the fixed number of features per segment and $N$ is the number of phonemes in the hypothesized sentence. As a result, the scores for hypotheses with different numbers of phonemes will effectively be computed over different dimensional probability spaces, and as such, will not be comparable. In particular, long segments will have lower costs per frame than short segments.

In this paper, we address the segment modeling problem using an approach that decomposes the recognition process into a segment classification problem and a segmentation scoring problem. The explicit use of a classification component allows the direct use of segmental measures as well as a variety of classification techniques that are not readily accommodated with other formulations. The segmentation score component effectively normalizes the scores of different length sequences, making them comparable.

*This research was jointly funded by NSF and DARPA under NSF grant number IRI-8902124.

2. CLASSIFICATION AND SEGMENTATION SCORING

2.1. General Model

The goal of speech recognition systems is to find the most likely label sequence, $A = a_1, \ldots, a_N$, given a sequence of acoustic observations, $X$. For simplicity, we can restrict the problem to finding the label sequence, $A$, and segmentation, $S = s_1, \ldots, s_N$, that have the highest joint likelihood given the observations. (There is typically no ...
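
The search criterion described in Section 2.1 can be written compactly. What follows is a sketch rather than the paper's own numbered equation, which lies outside this excerpt. The first line restates the joint search over labels and segmentations. The second line is only the chain rule applied to that joint posterior. Reading its two factors as the "classification" and "segmentation" scores is an assumption based on the wording of the abstract.

```latex
\begin{align*}
(\hat{A}, \hat{S})
  &= \operatorname*{argmax}_{A,\,S} \; p(A, S \mid X) \\
  &= \operatorname*{argmax}_{A,\,S} \;
     \underbrace{p(A \mid S, X)}_{\text{classification score}} \;
     \underbrace{p(S \mid X)}_{\text{segmentation score}}
\end{align*}
```

Here $X$ is the observation sequence, $A$ the label sequence, and $S$ the segmentation. Both factors are probabilities conditioned on the same observations $X$, which is one way to see why hypotheses with different numbers of segments could be compared directly under this form, in contrast to a product of fixed-length feature likelihoods whose total dimension depends on the number of segments.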

Similar resources

Modified CLPSO-based fuzzy classification System: Color Image Segmentation

Fuzzy segmentation is an effective way of segmenting out objects in images containing both random noise and varying illumination. In this paper, a modified method based on the Comprehensive Learning Particle Swarm Optimization (CLPSO) is proposed for pixel classification in HSI color space by selecting a fuzzy classification system with minimum number of fuzzy rules and minimum number of incorr...

A New IRIS Segmentation Method Based on Sparse Representation

Iris recognition is one of the most reliable methods for identification. In general, it consists of image acquisition, iris segmentation, feature extraction and matching. Among them, iris segmentation plays an important role in the performance of any iris recognition system. Nonlinear eye movement, occlusion, and specular reflection are the main challenges for any iris segmentation method. In thi...

Prostate segmentation and lesions classification in CT images using Mask R-CNN

Purpose: Non-cancerous prostate lesions such as prostate calcification, prostate enlargement, and prostate inflammation cause many problems for men’s health. This research proposes a novel approach, combining image processing techniques and deep learning methods, for classification and segmentation of the prostate in CT-scan images, taking into account the reports of experienced physicians. ...

Plant Classification in Images of Natural Scenes Using Segmentations Fusion

This paper presents a novel approach to automatically classifying and identifying tree leaves using image segmentation fusion. With the development of mobile devices and remote access, automatic plant identification in images taken in natural scenes has received much attention. Image segmentation plays a key role in most plant identification methods, especially in images with complex backgrounds. Wher...

Automated Tumor Segmentation Based on Hidden Markov Classifier using Singular Value Decomposition Feature Extraction in Brain MR images

Introduction: Diagnosing a brain tumor is not always easy for doctors, and an assistant that facilitates the interpretation process is an asset in the clinic. Computer vision techniques are devised to aid the clinic in detecting tumors based on a database of tumor c...

Publication date: 1992